18 research outputs found

    Aportación a la extracción paramétrica en reconocimiento de voz robusto basada en la aplicación de conocimiento de fonética acústica

    Full text link
    This thesis is based on the following hypothesis: the introduction of direct knowledge from the acoustic-phonetic field to the speech recognition problem, especially in the feature extraction step, may constitute a solid base of analysis for the determination of the behavior and capabilities of those systems and their improvement, as well. Most of the complexity of this Ph.D. thesis comes from the different subjects related with the speech processing área. The application of acoustic-phonetic information to the speech recognition research área implies a deep knowledge of both subjects. The research carried out in this work has been divided in two main parts: analysis of the current feature extraction methods and a study of several possible procedures about the incorporation of phonetic-acoustic knowledge to those systems. Abundant recognition and related quality measure results are presented for 50 different parameter extraction models. Details about the real-time implementation on a DSP platform (TMS3230C31-60) of two different parameter extraction models are presented. Finally, a set of computer tools developed for building and testing new speech recognition systems has been produced. Besides, the application of several results from this work can be extended to other speech processing áreas, such as computer assisted language learning, linguistic rehabilitation, etc.---ABSTRACT---La hipótesis en la que se basa el desarrollo de esta tesis, se centra en la suposición de que la aportación de conocimiento directo, proveniente del campo de la fonética acústica, al problema del reconocimiento automático de la voz, en concreto a la etapa de extracción de características, puede constituir una base sólida con la que poder analizar el comportamiento y capacidad de discriminación de dichos sistemas, así como una forma de mejorar sus prestaciones. Parte de la complejidad que presenta esta tesis doctoral, viene motivada por las diferentes disciplinas que están relacionadas con el área de procesamiento de la voz. La aplicación de información fonética-acústica al campo de investigación del reconocimiento del habla requiere un amplio conocimiento de ambas materias. Las investigaciones desarrolladas en este trabajo se han dividido en dos bloques fundamentales: análisis de los métodos actuales de extracción de rasgos fonéticos y un estudio de algunas posibles formas de incorporación de conocimiento fonético-acústico a dichos sistemas. En esta tesis se ofrecen abundantes resultados relativos a tasas de reconocimiento y medidas acerca de la calidad de este proceso, para un total de 50 modelos de extracción de parámetros. Así mismo se incluyen los detalles de la implementación en tiempo real para una plataforma DSP, en concreto TMS320C31-60, de dos diferentes modelos de extracción de rasgos. Además, se ha desarrollado un conjunto de las herramientas informáticas que pueden servir de base para construir y validar de forma sencilla, nuevos sistemas de reconocimiento. La aplicación de algunos de los resultados del trabajo puede extenderse también a otras áreas del tratamiento de la voz, tales como la enseñanza de una segunda lengua, logopedia, etc

    Improving speaker recognition by biometric voice deconstruction

    Get PDF
    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcedly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved during last years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers combined with the use of a set of features derived from the components, resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description about the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a highly controlled acoustic condition database, and on a mobile phone network recorded under non-controlled acoustic conditions

    Relevance of the glottal pulse and the vocal tract in gender detection

    Get PDF
    Gender detection is a very important objective to improve efficiency in tasks as speech or speaker recognition, among others. Traditionally gender detection has been focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here consists in obtaining uncorrelated glottal and vocal tract components which are parameterized as mel-frequency coefficients. K-fold and cross-validation using QDA and GMM classifiers showed that better detection rates are reached when glottal source and vocal tract parameters are used in a gender-balanced database of running speech from 340 speakers

    Using dysphonic voice to characterize speaker's biometry

    Get PDF
    Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometrical speaker characterization. In the present paper phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, is proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephonic database of 100 male Spanish native speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57 for a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64 % can be granted. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense vs that of the prosecution, thus ofering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading

    Neuromechanical Modelling of Articulatory Movements from Surface Electromyography and Speech Formants

    Get PDF
    Speech articulation is produced by the movements of muscles in the larynx, pharynx, mouth and face. Therefore speech shows acoustic features as formants which are directly related with neuromotor actions of these muscles. The first two formants are strongly related with jaw and tongue muscular activity. Speech can be used as a simple and ubiquitous signal, easy to record and process, either locally or on e-Health platforms. This fact may open a wide set of applications in the study of functional grading and monitoring neurodegenerative diseases. A relevant question, in this sense, is how far speech correlates and neuromotor actions are related. This preliminary study is intended to find answers to this question by using surface electromyographic recordings on the masseter and the acoustic kinematics related with the first formant. It is shown in the study that relevant correlations can be found among the surface electromyographic activity (dynamic muscle behavior) and the positions and first derivatives of the first formant (kinematic variables related to vertical velocity and acceleration of the joint jaw and tongue biomechanical system). As an application example, it is shown that the probability density function associated to these kinematic variables is more sensitive than classical features as Vowel Space Area (VSA) or Formant Centralization Ratio (FCR) in characterizing neuromotor degeneration in Parkinson's Disease.This work is being funded by Grants TEC2016-77791-C4-4-R from the Ministry of Economic Affairs and Competitiveness of Spain, Teka-Park 55 02 CENIE-0348_CIE_6_E POCTEP (InterReg Programme) and 16-30805A, SIX Research Center (CZ.1.05/2.1.00/03.0072), and LO1401 from the Czech Republic Government

    The Role of Data Analytics in the Assessment of Pathological Speech—A Critical Appraisal

    No full text
    Pathological voice characterization has received increasing attention over the last 20 years. Hundreds of studies have been published showing inventive approaches with very promising findings. Nevertheless, methodological issues might hamper performance assessment trustworthiness. This study reviews some critical aspects regarding data collection and processing, machine learning-oriented methods, and grounding analytical approaches, with a view to embedding developed clinical decision support tools into the diagnosis decision-making process. A set of 26 relevant studies published since 2010 was selected through critical selection criteria and evaluated. The model-driven (MD) or data-driven (DD) character of the selected approaches is deeply examined considering novelty, originality, statistical robustness, trustworthiness, and clinical relevance. It has been found that before 2020 most of the works examined were more aligned with MD approaches, whereas over the last two years a balanced proportion of DD and MD-based studies was found. A total of 15 studies presented MD characters, whereas seven were mainly DD-oriented, and four shared both profiles. Fifteen studies showed exploratory or prospective advanced statistical analysis. Eighteen included some statistical validation to avail claims. Twenty-two reported original work, whereas the remaining four were systematic reviews of others’ work. Clinical relevance and acceptability by voice specialists were found in 14 out of the 26 works commented on. Methodological issues such as detection and classification performance, training and generalization capability, explainability, preservation of semantic load, clinical acceptance, robustness, and development expenses have been identified as major issues in applying machine learning to clinical support systems. Other important aspects to be taken into consideration are trustworthiness, gender-balance issues, and statistical relevance

    The Role of Data Analytics in the Assessment of Pathological Speech—A Critical Appraisal

    No full text
    Pathological voice characterization has received increasing attention over the last 20 years. Hundreds of studies have been published showing inventive approaches with very promising findings. Nevertheless, methodological issues might hamper performance assessment trustworthiness. This study reviews some critical aspects regarding data collection and processing, machine learning-oriented methods, and grounding analytical approaches, with a view to embedding developed clinical decision support tools into the diagnosis decision-making process. A set of 26 relevant studies published since 2010 was selected through critical selection criteria and evaluated. The model-driven (MD) or data-driven (DD) character of the selected approaches is deeply examined considering novelty, originality, statistical robustness, trustworthiness, and clinical relevance. It has been found that before 2020 most of the works examined were more aligned with MD approaches, whereas over the last two years a balanced proportion of DD and MD-based studies was found. A total of 15 studies presented MD characters, whereas seven were mainly DD-oriented, and four shared both profiles. Fifteen studies showed exploratory or prospective advanced statistical analysis. Eighteen included some statistical validation to avail claims. Twenty-two reported original work, whereas the remaining four were systematic reviews of others’ work. Clinical relevance and acceptability by voice specialists were found in 14 out of the 26 works commented on. Methodological issues such as detection and classification performance, training and generalization capability, explainability, preservation of semantic load, clinical acceptance, robustness, and development expenses have been identified as major issues in applying machine learning to clinical support systems. Other important aspects to be taken into consideration are trustworthiness, gender-balance issues, and statistical relevance

    Diagnostic value of articulation neuromotor instability for neurodegenerative disease detection and rating

    No full text
    Neurodegenerative diseases leave an imprint on both phonation and articulation stability. Hypotonia (muscle stiffness), vocal fold unbalance (bi-laterality) and tremor (altered neuromotor feedback) some ways in which the degeneration manifests. Articulatory instability has been less studied, compared to reduced vowel space and vowel centralization. Another relevant aspect is the dynamic behavior of formant transitions in dyadochokinetic exercises, to assess hypokinetic dysarthria in Parkinson?s Disease (PD). The objective of the study is to compare articulation in PD patients against a normative population from kinematic features estimated on formants using Information Theory to quantify their divergence with respect to normative subjects
    corecore